Skip to content

feat: implement RewriteFiles operation#751

Open
WZhuo wants to merge 5 commits into
apache:mainfrom
WZhuo:rewrite_files
Open

feat: implement RewriteFiles operation#751
WZhuo wants to merge 5 commits into
apache:mainfrom
WZhuo:rewrite_files

Conversation

@WZhuo

@WZhuo WZhuo commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This PR implements the RewriteFiles operation for the Iceberg C++ library.

Changes

  • Added RewriteFiles update class in src/iceberg/update/rewrite_files.h and .cc
  • Added RewriteFiles to the type forward declarations in src/iceberg/type_fwd.h
  • Registered RewiteFiles as a merging snapshot update in src/iceberg/table.h and .cc
  • Added RewriteFiles support in transaction (src/iceberg/transaction.h and .cc)
  • Added comprehensive tests in src/iceberg/test/rewrite_files_test.cc
  • Added merge test for RewriteFiles in src/iceberg/test/merging_snapshot_update_test.cc

@WZhuo WZhuo force-pushed the rewrite_files branch 2 times, most recently from 0acf4f8 to d28ffe9 Compare June 16, 2026 11:19
Comment thread src/iceberg/table.cc
Comment thread src/iceberg/update/rewrite_files.cc Outdated
// ============================================================================
//
// TODO(RewriteDataAndDeleteFiles):
// Blocked by: RowDelta not yet ported.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI, RowDelta has been merged.

/// \param table_name The name of the table
/// \param ctx The transaction context
/// \return A unique pointer to the new RewriteFiles operation
static Result<std::unique_ptr<RewriteFiles>> Make(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that some update operations return a std::unique_ptr (e.g., RowDelta), while others return a std::shared_ptr (e.g., OverwriteFiles). I'm not sure what the standard is here. Should we make them consistent? cc @wgtmac


namespace iceberg {

class RewriteFilesTest : public MinimalUpdateTestBase {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider the table format version in these tests? I handled it in MergeAppendTestBase, but the resulting code structure feels a bit awkward. I'm wondering if we should refactor UpdateTestBase to make it easier to write tests for different table format versions.

// Verifies sequence number propagation to new manifests.
//
// TODO(Failure):
// Blocked by: commit fail injection (failCommits) not yet ported.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already supported. See BindTableWithFailingCommits in merge_append_test.cc. The implementation is a bit ugly though, feel free to improve it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants